Fix async storage retry response drain ordering#49581
Conversation
|
Thank you for your contribution @arnabnandy7! We will review the pull request and get back to you soon. |
|
@microsoft-github-policy-service agree |
There was a problem hiding this comment.
Pull request overview
Fixes an async retry ordering issue in RequestRetryPolicy to ensure retryable storage responses have their bodies drained (subscribed) before the response is closed, preventing hangs with Reactor Netty connection recycling during parallel downloads.
Changes:
- Reordered async retry handling to drain the response body before closing the response.
- Added a regression test validating the drain-vs-close ordering in the async retry path.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| sdk/storage/azure-storage-common/src/main/java/com/azure/storage/common/policy/RequestRetryPolicy.java | Adjusts async retry flow to drain the body before closing the response. |
| sdk/storage/azure-storage-common/src/test/java/com/azure/storage/common/policy/RequestRetryPolicyTest.java | Adds an async regression test covering drain-before-close behavior. |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
ibrandes
left a comment
There was a problem hiding this comment.
thanks for this fix! the doFinally ordering and Mono.defer are both the right choices here and the test approach of switching body behavior at subscription time cleanly isolates the race. a few things worth addressing before merging:
- your changes were missing onErrorResume -- without it, a drain error bypasses the retry entirely and surfaces to the caller. adding
.onErrorResume(drainError -> Mono.empty())betweendoFinallyandthencloses this gap. - another test case should be added for but the skipped-retry interleaving (
Flux.error(ISE)afterclose()) behavior - a couple of nits -- the
retryResponsealias is unnecessary;FAST_RETRY_OPTIONSwith a 1 ms delay makes theStepVerifiertimeout more meaningful as a hang detector.
please re-request review when you've addressed the open comments. also, feel free to add an entry for the fix in azure-storage-common\CHANGELOG.md under ### Bugs Fixed (line 9) -- happy to add it myself though if you'd prefer to leave that to us.
|
/azp run java - storage - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Description
Fixes #49507.
This PR fixes an async retry ordering issue in
RequestRetryPolicywhere a retryable storage response body was captured, the response was closed, and only then the body was subscribed to for draining before retrying.With Reactor Netty, closing the response first can discard the inbound stream and recycle the connection before the drain subscription is attached. In parallel blob downloads, that can cause a retryable chunk response, such as HTTP 500, to hang indefinitely instead of retrying or failing.
The retry path now drains the response body before closing the response, and closes it afterward via
doFinally. Responses without a body are still closed immediately before retrying.A regression test was added to verify that the async retry path subscribes to the retryable response body before closing it.
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines
Tested with:
mvn -f sdk\storage\azure-storage-common\pom.xml -Dtest=RequestRetryPolicyTest test